Improving spam filtering by combining Naive Bayes with simple k-nearest neighbor searches

Author

  • Daniel Etzold
Abstract

Using naive Bayes for email classification has become very popular within the last few months. Naive Bayes filters are quite easy to implement and very efficient. In this paper we present empirical results of email classification using a combination of naive Bayes and k-nearest neighbor searches. Using this technique, we show that the accuracy of a Bayes filter can be improved slightly for a high number of features and significantly for a small number of features.
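The abstract does not say how the two classifiers are combined, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a multinomial naive Bayes model with Laplace smoothing, a k-nearest neighbor search using Jaccard similarity over token sets, and a hypothetical combination rule: trust naive Bayes when its log-odds are decisive, and fall back to a k-NN vote otherwise. The names `NaiveBayes`, `knn_label`, `classify`, and the `margin` parameter are all illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real filters do more)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over word counts with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, docs):
        # docs is an iterable of (text, label) pairs, label in {"spam", "ham"}.
        for text, label in docs:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text)."""
        total_docs = sum(self.doc_counts.values())
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores["spam"] - scores["ham"]


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_label(text, docs, k=3):
    """Majority label among the k training documents most similar to text."""
    toks = tokenize(text)
    ranked = sorted(docs, key=lambda d: jaccard(toks, tokenize(d[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify(text, nb, docs, margin=1.0, k=3):
    """Hypothetical combination: naive Bayes first, k-NN when it is unsure."""
    odds = nb.log_odds(text)
    if abs(odds) >= margin:
        return "spam" if odds > 0 else "ham"
    return knn_label(text, docs, k)
```

To use the sketch, train `NaiveBayes` on labeled `(text, label)` pairs and pass the same pairs to `classify` as the neighbor pool. The `margin` threshold controls how often the k-NN fallback is consulted; its value here is arbitrary and would need tuning on real data.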

Similar resources

A Novel Method in Scam Detection and Prevention using Data Mining Approaches

‘Scam’ is a fraudulent message sent with criminal intent to internet users' mailboxes. Many approaches have been proposed to filter out unsolicited messages known as ‘spam’ from legitimate messages known as ‘ham’. However, to date no suitable approach has been proposed to detect scams. Almost all spam filters that use machine learning approaches classify scams as hams when scam messages ar...

A Novel Method for Detecting Spam Email using KNN Classification with Spearman Correlation as Distance Measure

E-mail is the most prevalent method of correspondence because of its availability, quick message exchange and low sending cost. Spam mail is a serious issue affecting this application on today's internet. Spam may contain suspicious URLs, or may ask for financial information such as money exchange information or credit card details. Here comes the scope of filtering spam from legitimate em...

Generating Estimates of Classification Confidence for a Case-Based Spam Filter

Producing estimates of classification confidence is surprisingly difficult. One might expect that classifiers that can produce numeric classification scores (e.g. k-Nearest Neighbour or Naive Bayes) could readily produce confidence estimates based on thresholds. In fact, this proves not to be the case, probably because these are not probabilistic classifiers in the strict sense. The numeric sco...

An evaluation of Naive Bayes variants in content-based learning for spam filtering

We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two current variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails which were received over 18 weeks by a single user, were used for evaluation. Our main motivation was to test whether two variants of Naive ...

Filtering spam e-mail from mixed Arabic and English messages: a comparison of machine learning techniques

Spam is one of the main problems in email communication. Although the volume of non-English spam is increasing, little work has been done in this area. For example, users in the Arab world receive spam written mostly in Arabic, English, or mixed Arabic and English. To filter this kind of message, this research applied several machine learning techniques. Many researchers have used machine learning techn...

Journal title:
  • CoRR

Volume cs.LG/0312004  Issue 

Pages  -

Publication date 2003